High-level and Low-level Feature Set for Image Caption Generation with Optimized Convolutional Neural Network
Authors
Abstract
Automatic creation of image descriptions, i.e. captioning of images, is an important topic in artificial intelligence (AI) that bridges the gap between computer vision (CV) and natural language processing (NLP). Currently, neural networks are becoming increasingly popular in captioning images, and researchers are looking for more efficient models for CV and sequence-to-sequence systems. This study focuses on a new image caption generation model that is divided into two stages. Initially, low-level features, such as contrast, sharpness, and color, and their high-level counterparts, such as motion and facial impact score, are extracted. Then, an optimized convolutional neural network (CNN) is harnessed to generate the captions from images. To enhance the accuracy of the process, the weights of the CNN are optimally tuned via spider monkey optimization with sine chaotic map evaluation (SMO-SCME). The proposed method is evaluated with a variety of metrics.
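The abstract names contrast, sharpness, and color as low-level features but does not say how they are computed. The following is a minimal sketch under stated assumptions, not the authors' method: contrast is taken as RMS contrast, sharpness as the variance of a 4-neighbour Laplacian, and color as the per-channel mean. All function names are hypothetical, and the image is assumed to be a nested list of (r, g, b) tuples.

```python
from statistics import mean, pstdev, pvariance


def to_gray(image):
    """Convert rows of (r, g, b) pixels to luminance (ITU-R BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in image]


def rms_contrast(gray):
    """Assumed contrast measure: standard deviation of pixel intensities."""
    return pstdev([v for row in gray for v in row])


def laplacian_sharpness(gray):
    """Assumed sharpness measure: variance of the 4-neighbour Laplacian
    response, computed over interior pixels only."""
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses) if responses else 0.0


def mean_color(image):
    """Assumed color descriptor: mean value of each RGB channel."""
    pixels = [p for row in image for p in row]
    return tuple(mean(p[c] for p in pixels) for c in range(3))


def low_level_features(image):
    """Bundle the three hypothetical low-level descriptors into one dict."""
    gray = to_gray(image)
    return {
        "contrast": rms_contrast(gray),
        "sharpness": laplacian_sharpness(gray),
        "color": mean_color(image),
    }


# Usage: a uniform gray patch versus a black-and-white checkerboard.
flat = [[(100, 100, 100)] * 4 for _ in range(4)]
checker = [[(255, 255, 255) if (x + y) % 2 else (0, 0, 0) for x in range(4)]
           for y in range(4)]
flat_features = low_level_features(flat)
checker_features = low_level_features(checker)
```

Under these assumptions the uniform patch yields contrast and sharpness of essentially zero, while the checkerboard scores higher on both. Such a feature vector could then be passed to a downstream model alongside the high-level features.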
Similar resources
A Radon-based Convolutional Neural Network for Medical Image Retrieval
Image classification and retrieval systems have gained more attention because of easier access to high-tech medical imaging. However, the lack of large-scale, balanced, labelled data in medicine is still a challenge. Simplicity, practicality, efficiency, and effectiveness are the main targets in the medical domain. To achieve these goals, Radon transformation, which is a well-known t...
Learning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state-of-the-art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...
Image Caption Generation with Recursive Neural Networks
The ability to recognize image features and generate accurate, syntactically reasonable text descriptions is important for many tasks in computer vision. Auto-captioning could, for example, be used to provide descriptions of website content, or to generate frame-by-frame descriptions of video for the vision-impaired. In this project, a multimodal architecture for generating image captions is ex...
A Saliency Detection Model via Fusing Extracted Low-level and High-level Features from an Image
Salient regions attract more human attention than other regions in an image. Low-level and high-level features are utilized in salient region detection. Low-level features contain primitive information such as color or texture while high-level features usually consider visual systems. Recently, some salient region detection methods have been proposed based on only low-level features or hig...
High-Level Expectations for Low-Level Image Processing
Scene interpretation systems are often conceived as extensions of low-level image analysis with bottom-up processing for high-level interpretations. In this contribution we show how a generic high-level interpretation system can generate hypotheses and initiate feedback in terms of top-down controlled low-level image analysis. Experimental results are reported about the recognition of structure...
Journal
Journal title: Journal of telecommunications and information technology
Year: 2022
ISSN: 1509-4553, 1899-8852
DOI: https://doi.org/10.26636/jtit.2022.164222